fix(qemu): improve VM shutdown with graceful timeouts and PID safety by smoser · Pull Request #2479 · chainguard-dev/melange

smoser · 2026-04-13T20:44:03Z

- Suppress expected ExitMissingError logs when VM powers off abruptly during SSH shutdown
- Add graceful multi-stage shutdown: wait 5s for process to exit, then SIGTERM + 5s, then SIGKILL
- Store *os.Process instead of raw PID to eliminate accidental signal delivery to reused PIDs
- Guarantee QEMU process exits cleanly before returning from TerminatePod

Fixes the race condition where libvterm builds would show spurious ERRO/WARN messages, while also making shutdown more robust and safe.
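The staged shutdown described above can be sketched roughly as follows. This is not the code from the PR: `shutdownProcess`, `waitOrTimeout`, and `runAndStop` are hypothetical names, the grace period is a parameter rather than the fixed 5s, and the sketch assumes a Unix host where `sleep` is available. It holds an `*os.Process` handle rather than a raw PID, and it returns only after the process has been reaped.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// waitOrTimeout reports whether done was closed within d.
func waitOrTimeout(done <-chan struct{}, d time.Duration) bool {
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

// shutdownProcess (hypothetical name) escalates in three stages:
// wait, then SIGTERM and wait, then SIGKILL. It returns only after the
// process has been reaped, and reports which stage ended it.
func shutdownProcess(proc *os.Process, grace time.Duration) string {
	done := make(chan struct{})
	go func() {
		_, _ = proc.Wait() // reap the child so it cannot linger as a zombie
		close(done)
	}()

	// Stage 1: give the process a chance to exit on its own.
	if waitOrTimeout(done, grace) {
		return "exited"
	}
	// Stage 2: ask politely. The error is ignored because the process
	// may have exited between the timeout and the signal.
	_ = proc.Signal(syscall.SIGTERM)
	if waitOrTimeout(done, grace) {
		return "sigterm"
	}
	// Stage 3: force termination and wait for the reap to finish.
	_ = proc.Kill()
	<-done
	return "sigkill"
}

// runAndStop starts a command and immediately runs the staged shutdown on it.
func runAndStop(args []string, grace time.Duration) (string, error) {
	cmd := exec.Command(args[0], args[1:]...)
	if err := cmd.Start(); err != nil {
		return "", err
	}
	return shutdownProcess(cmd.Process, grace), nil
}

func main() {
	stage, err := runAndStop([]string{"sleep", "30"}, 200*time.Millisecond)
	fmt.Println(stage, err)
}
```

Signalling through the `*os.Process` handle, rather than looking the process up again by PID, is what avoids delivering a signal to an unrelated process that later reuses the same PID.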

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…lled

We should not train people or machines to ignore red ERROR messages. With this change and chainguard-dev#2479, we have zero ERROR log entries in a successful build.

Previously RetrieveObservabilityEvents always sent three `test -f` SSH commands to probe for the observability events file, even when the hook was never installed. Each probe exits non-zero (file not found), causing sendSSHCommand to log ERROR to the console three times for every build.

During CPIO generation, scan the base initramfs for the hook's sentinel file (etc/tetragon/tetragon.tp.d/network-monitor.yaml) and record the result in cfg.ObservabilityHook. This is accurate regardless of how the package got into the image: QEMU_ADDITIONAL_PACKAGES, QEMU_BASE_INITRAMFS, or any other mechanism.

RetrieveObservabilityEvents returns immediately when ObservabilityHook is false, and treats a missing events file as an error when it is true. We can now also correctly ERROR when there _was_ an observability hook installed rather than just assuming it was not there.

Store the result of that scan in a sidecar (<cpio>.observability) so we do not have to scan on cached initramfs. The sidecar is invalidated automatically when the CPIO is newer (fresh build, QEMU_ADDITIONAL_PACKAGES change, etc.).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
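The sidecar invalidation rule can be illustrated with a small sketch. Only the `<cpio>.observability` naming and the "CPIO newer means stale" rule come from the commit message. The function names, the `true`/`false` file contents, and the modification-time comparison are assumptions made for illustration, not the merged implementation.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"time"
)

// sidecarPath follows the "<cpio>.observability" naming from the commit message.
func sidecarPath(cpio string) string { return cpio + ".observability" }

// writeSidecar records the scan result next to the CPIO.
// Storing "true"/"false" as text is an assumed format.
func writeSidecar(cpio string, hook bool) error {
	return os.WriteFile(sidecarPath(cpio), []byte(strconv.FormatBool(hook)+"\n"), 0o644)
}

// readSidecar returns the cached result. ok is false when the sidecar is
// missing, unparsable, or older than the CPIO, in which case a rescan is needed.
func readSidecar(cpio string) (hook, ok bool) {
	cpioInfo, err := os.Stat(cpio)
	if err != nil {
		return false, false
	}
	sideInfo, err := os.Stat(sidecarPath(cpio))
	if err != nil {
		return false, false
	}
	if cpioInfo.ModTime().After(sideInfo.ModTime()) {
		return false, false // CPIO was rebuilt after the scan: stale
	}
	data, err := os.ReadFile(sidecarPath(cpio))
	if err != nil {
		return false, false
	}
	hook, err = strconv.ParseBool(strings.TrimSpace(string(data)))
	if err != nil {
		return false, false
	}
	return hook, true
}

// sidecarDemo exercises a fresh sidecar and then a CPIO that became newer.
func sidecarDemo() (hook, freshOK, staleOK bool, err error) {
	dir, err := os.MkdirTemp("", "sidecar-demo")
	if err != nil {
		return false, false, false, err
	}
	defer os.RemoveAll(dir)

	cpio := filepath.Join(dir, "initramfs.cpio")
	if err = os.WriteFile(cpio, []byte("placeholder"), 0o644); err != nil {
		return
	}
	// Backdate the CPIO so the sidecar written next is clearly newer.
	past := time.Now().Add(-time.Hour)
	if err = os.Chtimes(cpio, past, past); err != nil {
		return
	}
	if err = writeSidecar(cpio, true); err != nil {
		return
	}
	hook, freshOK = readSidecar(cpio)

	// Simulate a rebuild: the CPIO is now newer than the sidecar.
	future := time.Now().Add(time.Hour)
	if err = os.Chtimes(cpio, future, future); err != nil {
		return
	}
	_, staleOK = readSidecar(cpio)
	return hook, freshOK, staleOK, nil
}

func main() {
	fmt.Println(sidecarDemo())
}
```

Under these assumptions, the cached result is used while the sidecar is at least as new as the CPIO and is ignored once the CPIO has been rebuilt. A caller would then rescan the initramfs and rewrite the sidecar.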

egibs approved these changes Apr 13, 2026

smoser merged commit 2f32e77 into main Apr 13, 2026
64 checks passed

smoser deleted the fix/less-errors-on-success branch April 13, 2026 23:10

smoser mentioned this pull request Apr 14, 2026

fix(observability): probe only when observability hook is installed. #2482

Merged

smoser merged 1 commit into main from fix/less-errors-on-success
